Automatic Association of Web Directories to Word Senses

نویسندگان

  • Celina Santamaría
  • Julio Gonzalo
  • M. Felisa Verdejo
چکیده

We describe an algorithm that combines lexical information (from WordNet 1.7) with Web directories (from the Open Directory Project) to associate word senses with such directories. Such associations can be used as rich characterizations to acquire sense-tagged corpora automatically, cluster topically-related senses and detect sense specializations. The algorithm is evaluated for the 29 nouns (147 senses) used in the Senseval 2 competition, obtaining 148 (word sense,Web directory) associations covering 88% of the domain-specific word senses in the test data with 86% accuracy. The richness of Web directories as sense characterizations is evaluated in a supervised Word Sense Disambiguation task using the Senseval 2 test suite. The results indicate that, when the directory/word sense association is correct, the samples automatically acquired from the Web directories are nearly as valid for training as the original Senseval 2 training instances. The results support our hypothesis that Web directories are a rich source of lexical information: cleaner, more reliable and more structured than the full Web as a corpus.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic Association Of Web Directories With Word Senses

We describe an algorithm that combines lexical information (from WordNet 1.7) with Web directories (from the Open Directory Project) to associate word senses with such directories. Such associations can be used as rich characterizations to acquire sense-tagged corpora automatically, cluster topically related senses, and detect sense specializations. The algorithm is evaluated for the 29 nouns (...

متن کامل

Automatic Generation of Web

This is a preliminary report of our study on automatic generation of web directories. We now concentrate on generation of web directories for speci c categories such as aquarium, zoo, and museum. From only a category word (e.g., aquarium), our system generates the directory of the category's instances, in which instances are classi ed by geographical region, without any human interaction.

متن کامل

Enriching Ontology Concepts Based on Texts from WWW and Corpus

In spite of the growing of ontological engineering tools, ontology knowledge acquisition remains a highly manual, time-consuming and complex task. Automatic ontology learning is a well-established research field whose goal is to support the semi-automatic construction of ontologies starting from available digital resources (e.g., A corpus, web pages, dictionaries, semi-structured and structured...

متن کامل

Semantic is Beautiful: Clustering and Diversifying Search Results with Graph-based Word Sense Induction

Web search result clustering aims to facilitate information search on the Web. Rather than presenting the results of a query as a flat list, these are grouped on the basis of their similarity and subsequently shown to the user as a list of possibly labeled clusters. Each cluster is supposed to represent a different meaning of the input query, thus taking into account the language ambiguity issu...

متن کامل

Building Web Directories in Different Languages for Decision Support: A Semi-Automatic Approach

Web directories organize voluminous information into hierarchical structures, helping users to quickly locate relevant information and to support decision-making. The development of existing Web directories either relies on expert participation that may not be available or uses automatic approaches that lack precision. As more users access the Web in their native languages, better approaches to...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computational Linguistics

دوره 29  شماره 

صفحات  -

تاریخ انتشار 2003